Emotion
EEVR: A Dataset of Paired Physiological Signals and Textual Descriptions for Joint Emotion Representation Learning
EEVR (Emotion Elicitation in Virtual Reality) is a novel dataset specifically designed for language supervision-based pre-training of emotion recognition tasks, such as valence and arousal classification. It features high-quality physiological signals, including electrodermal activity (EDA) and photoplethysmography (PPG), acquired through emotion elicitation via 360-degree virtual reality (VR) videos. Additionally, it includes subject-wise textual descriptions of the emotions experienced during each stimulus, gathered from qualitative interviews. The dataset consists of recordings from 37 participants and is the first to pair raw text with physiological signals, providing additional contextual information that objective labels cannot offer. To leverage this dataset, we introduce the Contrastive Language Signal Pre-training (CLSP) method, which jointly learns representations using pairs of physiological signals and textual descriptions. Our results show that integrating self-reported textual descriptions with physiological signals significantly improves performance on emotion recognition tasks such as arousal and valence classification. Moreover, our pre-trained CLSP model demonstrates strong zero-shot transferability to existing datasets, outperforming supervised baseline models and suggesting that the representations learned by our method are more contextualized and generalized. The release also includes baseline models for arousal, valence, and emotion classification, as well as code for data cleaning and feature extraction.
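As a rough illustration of the kind of objective CLSP describes, the sketch below pairs signal and text embeddings with a symmetric CLIP-style contrastive loss; the embedding dimension, temperature, and batch setup are placeholder assumptions rather than details taken from the paper.

```python
# Minimal sketch of a CLIP-style contrastive objective over paired physiological-signal
# and text embeddings. Encoders, dimensions, and the temperature are illustrative
# assumptions, not the CLSP implementation from the EEVR paper.
import torch
import torch.nn.functional as F

def clsp_contrastive_loss(signal_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss over a batch of (signal, text) pairs."""
    signal_emb = F.normalize(signal_emb, dim=-1)        # (B, D)
    text_emb = F.normalize(text_emb, dim=-1)            # (B, D)
    logits = signal_emb @ text_emb.t() / temperature    # (B, B) similarity matrix
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_s2t = F.cross_entropy(logits, targets)         # signal -> text direction
    loss_t2s = F.cross_entropy(logits.t(), targets)     # text -> signal direction
    return 0.5 * (loss_s2t + loss_t2s)

# Example: 8 paired EDA/PPG-window embeddings and interview-text embeddings.
loss = clsp_contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
```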
FindingEmo: An Image Dataset for Emotion Recognition in the Wild
Mertens, Laurent, Op de Beeck, Hans, Van den Stock, Jan
We introduce FindingEmo, a new image dataset containing annotations for 25k images, specifically tailored to Emotion Recognition. Contrary to existing datasets, it focuses on complex scenes depicting multiple people in various naturalistic, social settings, with images being annotated as a whole, thereby going beyond the traditional focus on faces or single individuals. Annotated dimensions include Valence, Arousal and Emotion label, with annotations gathered using Prolific. Together with the annotations, we release the list of URLs pointing to the original images, as well as all associated source code.
AI humanoid robot learns to mimic human emotions and behavior
Ready for a robot that not only looks human but also acts and reacts like one, expressing emotions like shyness, excitement or friendliness? Disney Research, the innovation powerhouse behind The Walt Disney Company, has turned this into reality. Its latest creation is an autonomous humanoid robot that can mimic human emotions and behaviors in real time. Think of it as a real-life WALL-E, but with even more personality. This groundbreaking robot uses advanced artificial intelligence to replicate natural gestures and deliberate actions with striking accuracy.
To Err Like Human: Affective Bias-Inspired Measures for Visual Emotion Recognition Evaluation
Yang, Jufeng
Accuracy is a commonly adopted performance metric in various classification tasks; it measures the proportion of correctly classified samples among all samples. It assumes equal importance for all classes and hence equal severity for all misclassifications. However, in emotion classification, owing to the psychological similarities between emotions, misclassifying a certain emotion into one class may be more severe than into another; e.g., misclassifying 'excitement' as 'anger' is clearly more severe than misclassifying it as 'awe'. Although highly meaningful for many applications, metrics capable of measuring these cases of misclassification in visual emotion recognition tasks have yet to be explored. In this paper, based on Mikel's emotion wheel from psychology, we propose a novel approach for evaluating performance in visual emotion recognition, which takes into account the distance on the emotion wheel between different emotions to mimic the psychological nuances of emotions. Experimental results in semi-supervised learning on emotion recognition and a user study show that our proposed metric is more effective than accuracy for assessing performance and conforms to the cognitive laws of human emotions.
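The following sketch illustrates one way such a wheel-distance-weighted score could look; the circular ordering of the eight emotions and the linear penalty are assumptions for illustration, not the exact metric proposed in the paper.

```python
# Hedged sketch of a distance-weighted evaluation score on an emotion wheel:
# misclassifications into nearby emotions are penalised less than distant ones.
# The wheel ordering and the linear credit scheme below are illustrative assumptions.
import numpy as np

WHEEL = ["amusement", "awe", "contentment", "excitement",
         "sadness", "fear", "disgust", "anger"]  # assumed circular ordering
INDEX = {e: i for i, e in enumerate(WHEEL)}

def wheel_distance(a: str, b: str) -> int:
    """Shortest circular distance between two emotions on the wheel."""
    d = abs(INDEX[a] - INDEX[b])
    return min(d, len(WHEEL) - d)

def wheel_weighted_score(y_true, y_pred) -> float:
    """1.0 for a correct prediction; partial credit decays with wheel distance."""
    max_d = len(WHEEL) // 2
    credits = [1.0 - wheel_distance(t, p) / max_d for t, p in zip(y_true, y_pred)]
    return float(np.mean(credits))

# Under this ordering, 'excitement' -> 'awe' keeps partial credit (0.5),
# while 'excitement' -> 'anger' gets none (0.0).
print(wheel_weighted_score(["excitement"], ["awe"]))
print(wheel_weighted_score(["excitement"], ["anger"]))
```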
Hybrid Emotion Recognition: Enhancing Customer Interactions Through Acoustic and Textual Analysis
Wewelwala, Sahan Hewage, Sumanathilaka, T. G. D. K.
This research presents a hybrid emotion recognition system integrating advanced Deep Learning, Natural Language Processing (NLP), and Large Language Models (LLMs) to analyze audio and textual data for enhancing customer interactions in contact centers. By combining acoustic features with textual sentiment analysis, the system achieves nuanced emotion detection, addressing the limitations of traditional approaches in understanding complex emotional states. Rigorous testing on diverse datasets demonstrates the system's robustness and accuracy, highlighting its potential to transform customer service by enabling personalized, empathetic interactions and improving operational efficiency. This research establishes a foundation for more intelligent and human-centric digital communication, redefining customer service standards. The capacity to identify and comprehend emotions effectively is an essential element of human-computer interaction, especially in spoken and written communication.
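One common way to realise this kind of acoustic-plus-text combination is late fusion of per-class probabilities; the sketch below shows that pattern with an assumed label set, fusion weight, and stand-in model outputs, and is not a reconstruction of the system described above.

```python
# Hedged sketch of late fusion between an acoustic emotion classifier and a
# text-based classifier. The class set, fusion weight, and probability vectors
# are illustrative stand-ins, not the paper's actual models or strategy.
import numpy as np

CLASSES = ["angry", "happy", "neutral", "sad"]  # assumed label set

def fuse_predictions(p_acoustic, p_text, w_acoustic=0.6):
    """Weighted average of per-class probabilities from the two modalities."""
    p = w_acoustic * np.asarray(p_acoustic) + (1.0 - w_acoustic) * np.asarray(p_text)
    p /= p.sum()
    return CLASSES[int(np.argmax(p))], p

# Stand-ins for model outputs on one customer utterance.
p_audio = [0.55, 0.10, 0.25, 0.10]   # prosody suggests anger
p_text = [0.30, 0.05, 0.50, 0.15]    # transcript reads as neutral
label, probs = fuse_predictions(p_audio, p_text)
print(label, probs.round(3))
```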
GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations
Li, Yupei, Sun, Qiyang, Murthy, Sunil Munthumoduku Krishna, Alturki, Emran, Schuller, Björn W.
Affective Computing (AC) is essential for advancing Artificial General Intelligence (AGI), with emotion recognition serving as a key component. However, human emotions are inherently dynamic, influenced not only by an individual's expressions but also by interactions with others, and single-modality approaches often fail to capture their full dynamics. Multimodal Emotion Recognition (MER) leverages multiple signals but traditionally relies on utterance-level analysis, overlooking the dynamic nature of emotions in conversations. Emotion Recognition in Conversation (ERC) addresses this limitation, yet existing methods struggle to align multimodal features and explain why emotions evolve within dialogues. To bridge this gap, we propose GatedxLSTM, a novel speech-text multimodal ERC model that explicitly considers the voice and transcripts of both the speaker and their conversational partner(s) to identify the most influential sentences driving emotional shifts. By integrating Contrastive Language-Audio Pretraining (CLAP) for improved cross-modal alignment and employing a gating mechanism to emphasise emotionally impactful utterances, GatedxLSTM enhances both interpretability and performance. Experiments on the IEMOCAP dataset demonstrate that GatedxLSTM achieves state-of-the-art (SOTA) performance among open-source methods in four-class emotion classification. These results validate its effectiveness for ERC applications and provide an interpretability analysis from a psychological perspective.
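To make the gating idea concrete, the sketch below weights per-utterance audio-text features with a learned sigmoid gate before a recurrent pass over the conversation; the dimensions, the plain LSTM standing in for xLSTM, and the feature shapes are assumptions, not the GatedxLSTM architecture itself.

```python
# Hedged sketch of gating over per-utterance speech/text features, illustrating the
# general idea of emphasising emotionally salient utterances before a recurrent pass.
# Dimensions and the plain LSTM (in place of xLSTM) are illustrative assumptions.
import torch
import torch.nn as nn

class GatedUtteranceEncoder(nn.Module):
    def __init__(self, audio_dim=512, text_dim=512, hidden_dim=256, num_classes=4):
        super().__init__()
        fused_dim = audio_dim + text_dim
        self.gate = nn.Sequential(nn.Linear(fused_dim, 1), nn.Sigmoid())
        self.lstm = nn.LSTM(fused_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, audio_feats, text_feats):
        # audio_feats, text_feats: (batch, num_utterances, dim), e.g. CLAP-style embeddings
        fused = torch.cat([audio_feats, text_feats], dim=-1)
        weights = self.gate(fused)          # (batch, T, 1): salience of each utterance
        gated = fused * weights             # down-weight less impactful utterances
        out, _ = self.lstm(gated)
        return self.classifier(out[:, -1])  # predict emotion from the final state

model = GatedUtteranceEncoder()
logits = model(torch.randn(2, 10, 512), torch.randn(2, 10, 512))  # 2 dialogues, 10 utterances
```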
Exploring Cultural Nuances in Emotion Perception Across 15 African Languages
Ahmad, Ibrahim Said, Dudy, Shiran, Belay, Tadesse Destaw, Abdulmumin, Idris, Yimam, Seid Muhie, Muhammad, Shamsuddeen Hassan, Church, Kenneth
Understanding how emotions are expressed across languages is vital for building culturally-aware and inclusive NLP systems. However, emotion expression in African languages is understudied, limiting the development of effective emotion detection tools in these languages. In this work, we present a cross-linguistic analysis of emotion expression in 15 African languages. We examine four key dimensions of emotion representation: text length, sentiment polarity, emotion co-occurrence, and intensity variations. Our findings reveal diverse language-specific patterns in emotional expression: Somali texts are typically longer, while others like IsiZulu and Algerian Arabic show more concise emotional expression. We observe a higher prevalence of negative sentiment in several Nigerian languages, compared with lower negativity in languages like IsiXhosa. Further, emotion co-occurrence analysis demonstrates strong cross-linguistic associations between specific emotion pairs (anger-disgust, sadness-fear), suggesting universal psychological connections. Intensity distributions show multimodal patterns with significant variations between language families; Bantu languages display similar yet distinct profiles, while Afroasiatic languages and Nigerian Pidgin demonstrate wider intensity ranges. These findings highlight the need for language-specific approaches to emotion detection while identifying opportunities for transfer learning across related languages.
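For readers wanting to reproduce one of these analysis dimensions, the sketch below counts emotion-pair co-occurrences per language from multi-label annotations; the column names and toy data are assumptions about the dataset layout, not the study's actual schema.

```python
# Hedged sketch of per-language emotion co-occurrence counting over multi-label
# annotations. The column names and label set are assumed for illustration.
import itertools
from collections import Counter

import pandas as pd

def emotion_cooccurrence(df: pd.DataFrame, language: str) -> Counter:
    """Count how often pairs of emotions are annotated together in one language."""
    pairs = Counter()
    for labels in df.loc[df["language"] == language, "emotions"]:
        for a, b in itertools.combinations(sorted(set(labels)), 2):
            pairs[(a, b)] += 1
    return pairs

# Toy example with an assumed layout: one row per text, list of emotion labels.
toy = pd.DataFrame({
    "language": ["hau", "hau", "zul"],
    "emotions": [["anger", "disgust"], ["sadness", "fear"], ["joy"]],
})
print(emotion_cooccurrence(toy, "hau"))
```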